Genomics Data Repositories
Rockefeller University, Bioinformatics Resource Centre
https://rockefelleruniversity.github.io/Genomic_Data/
Data Repositories
Getting hold of HTS data
From public repositories.
From collaborators.
By sequencing some of your own material!
Repositories for HTS
Public Repositories for HTS
Several public sources of HTS data exist.
We concentrate first on those that act as repositories.
GEO (Gene Expression Omnibus).
ENA (European Nucleotide Archive).
SRA (Sequence Read Archive, formerly the Short Read Archive).
Gene Expression Omnibus
GEO (https://www.ncbi.nlm.nih.gov/geo/)
GEO holds different types of biological datasets.
Very popular for submission of data accompanying publication.
Captures metadata, processed files and raw data.
GEO was not built for HTS data.
Short Read Archive
SRA (www.ncbi.nlm.nih.gov/sra)
NCBI’s HTS-specific repository.
Sequencing-specific metadata.
Stores raw data (in SRA format).
Reading SRA format requires the SRA Toolkit.
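As a minimal sketch of how the SRA Toolkit is typically used, the Python snippet below builds the two command lines that fetch a run and convert it to FASTQ. It assumes the toolkit's `prefetch` and `fasterq-dump` programs are installed and on the PATH. The accession `SRR000001`, the output directory name, and the helper name `sra_download_commands` are placeholders chosen for illustration.

```python
import shlex


def sra_download_commands(run_accession, outdir="fastq", threads=4):
    """Return SRA Toolkit command lines (as argument lists) for one run.

    Step 1 (prefetch) downloads the run in SRA format.
    Step 2 (fasterq-dump) converts the downloaded run to FASTQ.
    """
    prefetch = ["prefetch", run_accession]
    fasterq = [
        "fasterq-dump", run_accession,
        "--split-files",            # write paired reads to separate files
        "--threads", str(threads),
        "--outdir", outdir,
    ]
    return [prefetch, fasterq]


if __name__ == "__main__":
    # Print the commands rather than running them; SRR000001 is a placeholder.
    for cmd in sra_download_commands("SRR000001"):
        print(shlex.join(cmd))
```

To execute the commands instead of printing them, each list could be passed to `subprocess.run(cmd, check=True)`.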
European Nucleotide Archive
ENA (https://www.ebi.ac.uk/ena)
ENA acts as a European HTS repository.
Mirrors much of SRA.
Stores raw data.
No SRA format: FASTQ by default.
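Because ENA serves FASTQ directly, file locations can be looked up by accession. The sketch below assumes the ENA Portal API `filereport` endpoint and its `fastq_ftp` field, which lists paths separated by semicolons without a URL scheme. The helper names and the `ftp://` prefix are illustrative choices, and any response text passed to `fastq_urls` is expected to be the TSV the endpoint returns.

```python
import csv
import io
from urllib.parse import urlencode

ENA_FILEREPORT = "https://www.ebi.ac.uk/ena/portal/api/filereport"


def ena_filereport_url(accession):
    """Build a filereport query URL for a run, experiment or study accession."""
    params = {
        "accession": accession,
        "result": "read_run",
        "fields": "run_accession,fastq_ftp,fastq_md5",
        "format": "tsv",
    }
    return ENA_FILEREPORT + "?" + urlencode(params)


def fastq_urls(tsv_text):
    """Map each run accession to its list of FASTQ URLs from a TSV report."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    urls = {}
    for row in reader:
        # fastq_ftp holds ';'-separated paths; paired-end runs have two or more.
        paths = [p for p in row["fastq_ftp"].split(";") if p]
        urls[row["run_accession"]] = ["ftp://" + p for p in paths]  # assumed scheme
    return urls


if __name__ == "__main__":
    # SRR000001 is a placeholder accession; fetch the URL with any HTTP client.
    print(ena_filereport_url("SRR000001"))
```

The report text can be retrieved with `urllib.request.urlopen(url).read().decode()` and passed to `fastq_urls`.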
Other Repositories
Many repositories contain processed or unprocessed data.
These are typically the result of a consortium’s data release policies.
A good example is the ENCODE site (https://www.encodeproject.org/).
UCSC has many useful links to genomics data in various formats (http://hgdownload.soe.ucsc.edu/downloads.html).
ENCODE Portal
ENCODE portal provides access to raw and processed/standardised results.
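The ENCODE portal can also be queried programmatically. The sketch below assumes the portal's search endpoint accepts `type`, `searchTerm`, `format=json` and `limit` query parameters and returns matching records under an `@graph` key. The helper names and the search term are illustrative only.

```python
from urllib.parse import urlencode

ENCODE_SEARCH = "https://www.encodeproject.org/search/"


def encode_search_url(search_term, limit=5):
    """Build a JSON search URL for ENCODE experiments matching a free-text term."""
    params = {
        "type": "Experiment",
        "searchTerm": search_term,
        "format": "json",
        "limit": limit,
    }
    return ENCODE_SEARCH + "?" + urlencode(params)


def experiment_accessions(response_json):
    """Collect accessions from a parsed search response (a dict)."""
    return [
        item["accession"]
        for item in response_json.get("@graph", [])
        if "accession" in item
    ]


if __name__ == "__main__":
    # "CTCF" is an example search term, not taken from the slides.
    print(encode_search_url("CTCF"))
```

The response from the URL can be parsed with `json.loads` and passed to `experiment_accessions`.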
Repositories for processed data
Other specialist repositories exist.
The ReCount2 database provides standardised counts for user analysis.
Other databases, such as ImmGen, BodyMap and Expression Atlas, provide RNA-seq data for specific cells and tissues.
Reference data
Reference genomes are available from many locations.
Different assemblies exist for each genome.
Major revisions - change genomic locations (coordinates).
Minor revisions - update annotation.
Genome sequence stored as FASTA.
Gene build as GFF3 or GTF.
iGenomes contains full annotation files for many genomes.
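To make the FASTA and GTF formats mentioned above concrete, here is a minimal sketch of reading each with plain Python. It assumes FASTA records start with a `>` header line, and that GTF lines have nine tab-separated columns with `key "value";` pairs in the last one. The function names are illustrative. The attribute parser is deliberately simple: it does not handle semicolons inside quoted values, and repeated keys overwrite earlier ones.

```python
def read_fasta(lines):
    """Yield (name, sequence) pairs from an iterable of FASTA lines."""
    name, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                yield name, "".join(chunks)
            header = line[1:].split()
            # Use the first word of the header as the sequence name.
            name, chunks = (header[0] if header else ""), []
        else:
            chunks.append(line)
    if name is not None:
        yield name, "".join(chunks)


def parse_gtf_line(line):
    """Parse one GTF feature line into a dict with 1-based integer coordinates."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) != 9:
        raise ValueError("expected 9 tab-separated GTF columns")
    seqname, source, feature, start, end, score, strand, frame, attributes = fields
    attrs = {}
    for item in attributes.split(";"):
        item = item.strip()
        if not item:
            continue
        key, _, value = item.partition(" ")
        attrs[key] = value.strip().strip('"')
    return {
        "seqname": seqname, "source": source, "feature": feature,
        "start": int(start), "end": int(end),
        "score": score, "strand": strand, "frame": frame,
        "attributes": attrs,
    }
```

Comment lines in a GTF file begin with `#` and should be skipped before calling `parse_gtf_line`. For real analyses, an established parser is usually a safer choice than hand-written code.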